Storing Trees on Disk Drives
Authors
Abstract
Tree-structured data are abundant today, ranging from bioinformatics suffix-tree alignments, to multi-resolution video, to directory-file hierarchies, to XML. The storage techniques employed by systems that manage tree-structured data greatly affect their performance. Current approaches either map the tree data to an underlying relational database system, use the abstraction provided by a general-purpose object storage manager, or simply use flat files. These storage schemes, however, ignore both the tree structure of the data and the characteristics of disk drives: relational databases organize data as structured tables, and flat files are unstructured. Disk drives, in turn, store information in circular tracks that are accessed with mechanical seek and rotational overhead, so their performance depends heavily on the I/O access pattern; sequential and random access times can differ by orders of magnitude. To the best of our knowledge, there exists no data layout strategy that accounts for the structural mismatch between tree-structured data and disk drive storage. We propose a new storage technique, tree-structured placement, that explicitly accounts for the mismatch between tree-structured data and disk drive characteristics, so that common navigation operations (parent-to-child and node-to-next-sibling) are efficient. This technique uses the recently proposed idea of semi-sequential disk access [2] to place the tree structure. We also present optimizations that reduce on-disk space fragmentation and average random seek times. Experimental evaluation using the DiskSim disk simulator [1] suggests up to an 80% reduction in query I/O times compared to the default sequential layout of tree-structured data.
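The mismatch described in the abstract can be illustrated with a small sketch. It is not the proposed tree-structured placement, whose details the abstract does not give. It assumes that the "default sequential layout" is a depth-first (pre-order) numbering of nodes into consecutive blocks, and the names `Node`, `build_tree`, `preorder_layout`, and `sibling_gaps` are hypothetical. Under that assumption, the first child sits right next to its parent, but reaching the next sibling means skipping over an entire subtree.

```python
class Node:
    """A tree node; `children` is an ordered list of child nodes."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def build_tree(fanout, depth, name="r"):
    """Build a complete tree with the given fanout and depth (hypothetical helper)."""
    if depth == 0:
        return Node(name)
    kids = [build_tree(fanout, depth - 1, f"{name}.{i}") for i in range(fanout)]
    return Node(name, kids)

def preorder_layout(root):
    """Assign consecutive block numbers in depth-first (pre-order) order.

    Assumption: this stands in for a sequential on-disk layout.
    """
    position = {}
    stack = [root]
    while stack:
        node = stack.pop()
        position[node.name] = len(position)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return position

def sibling_gaps(root, position):
    """Block distance between each node and its next sibling."""
    gaps = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = node.children
        gaps.extend(position[b.name] - position[a.name]
                    for a, b in zip(kids, kids[1:]))
        stack.extend(kids)
    return gaps

tree = build_tree(fanout=3, depth=2)
pos = preorder_layout(tree)
gaps = sibling_gaps(tree, pos)
print("parent-to-first-child gap:", pos["r.0"] - pos["r"])
print("largest next-sibling gap:", max(gaps))
```

In this toy model the next-sibling gap equals the size of the skipped subtree, so it grows with fanout and depth, while the first-child gap stays at one block. On a real drive, a larger gap generally means more seek and rotational overhead, which is the cost that a layout aware of both navigation patterns aims to reduce.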
Similar resources
After Hard Drives — What Comes Next ?
There are numerous emerging nonvolatile memory technologies, which have been proposed as being capable of replacing hard disk drives (HDDs). In this paper, the prospects for these alternative technologies to displace HDDs in 2020 are analyzed. In order to compare technologies, projections were made of storage density and performance in year 2020 for both hard disks and the alternative technolog...
Increasing Performance of ext3 with USB Flash Drives
There has been a mass movement in operating systems to Journaling File Systems, such as ext3 and NTFS. Journaling File Systems implement a journal, which stores information on how to update files in the system to make them consistent. Sometimes the journal also stores data before it is written to the main part of the file system. Journaling File Systems such as ext3 originally kept the journal ...
12. Algorithmic Approaches for Storage Networks
Persistent storage in modern computers is usually realized by magnetic hard disk drives. They form the last, and therefore the slowest, level in the memory hierarchy. Disk drive technology is very sophisticated and complex, making an accelerated growth in disk capacity possible. Nowadays, a single off-the-shelf disk drive is capable of storing up to 180 GB and this number is doubled every 14 – 18 ...
Power-aware Remote Replication for Enterprise-level Disaster Recovery Systems
Electric energy consumed in data centers is rapidly growing. Power-aware IT, recently called ‘green IT’, is widely recognized as a significant challenge. Disk storage is a non-negligible energy consumer. Rather, in light of recent data-intensive systems where a number of disk drives are incorporated, the disk storage may be what we must consider primarily. Yet, all of the disk drives are not us...
Protecting Data against Early Disk Failures
Disk drives are known to fail at a higher rate during their first year of operation than during the remaining years of their useful lifetime. We propose to use the free space that normally exists on new disks to minimize the risk of data loss during that first year. Our technique applies to disk arrays that mirror their data on two disks. Whenever a disk fails, the array will reorganize itself ...